Dataset statistics
| Number of variables | 7 |
|---|---|
| Number of observations | 217885 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 11.6 MiB |
| Average record size in memory | 56.0 B |
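The memory figures above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that each of the 7 numeric columns is stored as a 64-bit value (8 bytes per cell); the report does not state the dtypes, so this is an assumption, and the variable names are illustrative only.

```python
# Row and column counts are taken from the "Dataset statistics" table.
n_rows = 217885
n_columns = 7

# Assumption (not stated in the report): every column is a 64-bit number.
bytes_per_value = 8

# Average record size: one 8-byte value per column.
record_bytes = n_columns * bytes_per_value

# Total size and per-column size, converted from bytes to MiB (2**20 bytes).
total_mib = n_rows * record_bytes / 2**20
column_mib = n_rows * bytes_per_value / 2**20

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the results match the 56.0 B record size and 11.6 MiB total reported here, as well as the 1.7 MiB memory size listed for each variable.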
Variable types
| Numeric | 7 |
|---|---|
Warnings
| df_index has unique values | Unique |
|---|---|
| Vacancy_Rate% has 4712 (2.2%) zeros | Zeros |
Reproduction
| Analysis started | 2021-02-23 16:16:34.871326 |
|---|---|
| Analysis finished | 2021-02-23 16:18:59.053777 |
| Duration | 2 minutes and 24.18 seconds |
| Software version | pandas-profiling v2.10.0 |
df_index
| Distinct | 217885 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 134658.4531 |
|---|---|
| Minimum | 1 |
| Maximum | 264959 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 13534.2 |
| Q1 | 69303 |
| median | 136264 |
| Q3 | 200994 |
| 95-th percentile | 252198.8 |
| Maximum | 264959 |
| Range | 264958 |
| Interquartile range (IQR) | 131691 |
Descriptive statistics
| Standard deviation | 76336.12375 |
|---|---|
| Coefficient of variation (CV) | 0.5668869793 |
| Kurtosis | -1.193004004 |
| Mean | 134658.4531 |
| Median Absolute Deviation (MAD) | 65859 |
| Skewness | -0.0407734205 |
| Sum | 2.934005706 × 10^10 |
| Variance | 5827203789 |
| Monotonicity | Strictly increasing |
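Several of the figures above are derived from others in the same tables, which makes it easy to sanity-check a report. The sketch below recomputes the range, IQR, coefficient of variation and variance from the minimum, maximum, quartiles, mean and standard deviation listed directly above; the variable names are illustrative and not part of the report.

```python
# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 1, 264959
q1, q3 = 69303, 200994
mean = 134658.4531
std = 76336.12375

# Range is the spread between the extremes.
value_range = maximum - minimum

# Interquartile range is the spread of the middle 50% of the data.
iqr = q3 - q1

# Coefficient of variation is the standard deviation relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

print(value_range, iqr, round(cv, 6), round(variance))
```

The recomputed values agree with the reported Range, IQR, CV and Variance up to rounding of the inputs.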
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% |
| 164357 | 1 | < 0.1% |
| 178690 | 1 | < 0.1% |
| 172545 | 1 | < 0.1% |
| 174592 | 1 | < 0.1% |
| 262613 | 1 | < 0.1% |
| 86399 | 1 | < 0.1% |
| 88446 | 1 | < 0.1% |
| 82301 | 1 | < 0.1% |
| 84348 | 1 | < 0.1% |
| Other values (217875) | 217875 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 264959 | 1 | |
| 264958 | 1 | |
| 264957 | 1 | |
| 264956 | 1 | |
| 264955 | 1 | |
RentPrice
Real number (ℝ≥0)
| Distinct | 145391 |
|---|---|
| Distinct (%) | 66.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1095.392952 |
|---|---|
| Minimum | 19.96 |
| Maximum | 5620.32 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 19.96 |
|---|---|
| 5-th percentile | 608.8712 |
| Q1 | 804.206 |
| median | 966.296 |
| Q3 | 1236.036 |
| 95-th percentile | 1994.636 |
| Maximum | 5620.32 |
| Range | 5600.36 |
| Interquartile range (IQR) | 431.83 |
Descriptive statistics
| Standard deviation | 493.4443644 |
|---|---|
| Coefficient of variation (CV) | 0.4504724663 |
| Kurtosis | 12.9915483 |
| Mean | 1095.392952 |
| Median Absolute Deviation (MAD) | 195.706 |
| Skewness | 2.763417001 |
| Sum | 238669693.2 |
| Variance | 243487.3408 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1281.736 | 363 | 0.2% |
| 731.736 | 347 | 0.2% |
| 681.736 | 335 | 0.2% |
| 1006.736 | 313 | 0.1% |
| 631.736 | 307 | 0.1% |
| 831.736 | 291 | 0.1% |
| 781.736 | 285 | 0.1% |
| 1106.736 | 258 | 0.1% |
| 581.736 | 244 | 0.1% |
| 881.736 | 240 | 0.1% |
| Other values (145381) | 214902 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19.96 | 4 | |
| 94.96 | 4 | |
| 103.29 | 1 | < 0.1% |
| 139.4 | 1 | < 0.1% |
| 144.96 | 6 | |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5620.32 | 2 | < 0.1% |
| 5619.795 | 1 | < 0.1% |
| 5616.46 | 3 | < 0.1% |
| 5563.03 | 1 | < 0.1% |
| 5558.206 | 102 | |
SizeRank
Real number (ℝ≥0)
| Distinct | 11050 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14615.05118 |
|---|---|
| Minimum | 0 |
| Maximum | 34430 |
| Zeros | 8 |
| Zeros (%) | < 0.1% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1379.2 |
| Q1 | 6938 |
| median | 14027 |
| Q3 | 21854 |
| 95-th percentile | 29685 |
| Maximum | 34430 |
| Range | 34430 |
| Interquartile range (IQR) | 14916 |
Descriptive statistics
| Standard deviation | 8955.875344 |
|---|---|
| Coefficient of variation (CV) | 0.6127843984 |
| Kurtosis | -1.047220924 |
| Mean | 14615.05118 |
| Median Absolute Deviation (MAD) | 7426 |
| Skewness | 0.1963601593 |
| Sum | 3184400426 |
| Variance | 80207703.18 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 29964 | 230 | 0.1% |
| 28097 | 201 | 0.1% |
| 30545 | 200 | 0.1% |
| 24938 | 186 | 0.1% |
| 27522 | 185 | 0.1% |
| 27365 | 184 | 0.1% |
| 25092 | 183 | 0.1% |
| 32062 | 179 | 0.1% |
| 29685 | 178 | 0.1% |
| 29401 | 177 | 0.1% |
| Other values (11040) | 215982 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8 | |
| 1 | 8 | |
| 2 | 8 | |
| 3 | 8 | |
| 4 | 8 | |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 34430 | 27 | < 0.1% |
| 34322 | 75 | |
| 34302 | 16 | < 0.1% |
| 34258 | 2 | < 0.1% |
| 34247 | 2 | < 0.1% |
HomePrice
Real number (ℝ≥0)
| Distinct | 210975 |
|---|---|
| Distinct (%) | 96.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 185450.5809 |
|---|---|
| Minimum | 10956.33 |
| Maximum | 6141945.92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 10956.33 |
|---|---|
| 5-th percentile | 50771.966 |
| Q1 | 88016.67 |
| median | 134667.5 |
| Q3 | 214877 |
| 95-th percentile | 484188.002 |
| Maximum | 6141945.92 |
| Range | 6130989.59 |
| Interquartile range (IQR) | 126860.33 |
Descriptive statistics
| Standard deviation | 185121.166 |
|---|---|
| Coefficient of variation (CV) | 0.9982237056 |
| Kurtosis | 60.3893833 |
| Mean | 185450.5809 |
| Median Absolute Deviation (MAD) | 55752.83 |
| Skewness | 5.483810985 |
| Sum | 4.040689981 × 10^10 |
| Variance | 3.426984612 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 57537.17 | 4 | < 0.1% |
| 110771.67 | 4 | < 0.1% |
| 54673.67 | 4 | < 0.1% |
| 75169.83 | 4 | < 0.1% |
| 81272 | 4 | < 0.1% |
| 88629 | 4 | < 0.1% |
| 236968.25 | 3 | < 0.1% |
| 85790.42 | 3 | < 0.1% |
| 85846.33 | 3 | < 0.1% |
| 109707.5 | 3 | < 0.1% |
| Other values (210965) | 217849 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10956.33 | 1 | |
| 11688 | 1 | |
| 11860.83 | 1 | |
| 12041.42 | 1 | |
| 12062.83 | 1 | |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6141945.92 | 1 | |
| 5373670.92 | 1 | |
| 4928414.67 | 1 | |
| 4522642.08 | 1 | |
| 4260975 | 1 | |
Vacancy_Rate%
| Distinct | 164679 |
|---|---|
| Distinct (%) | 75.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.17859668 |
|---|---|
| Minimum | 0 |
| Maximum | 99.83974359 |
| Zeros | 4712 |
| Zeros (%) | 2.2% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.820812062 |
| Q1 | 7 |
| median | 12.02058775 |
| Q3 | 20.33293698 |
| 95-th percentile | 45.93904196 |
| Maximum | 99.83974359 |
| Range | 99.83974359 |
| Interquartile range (IQR) | 13.33293698 |
Descriptive statistics
| Standard deviation | 14.00675651 |
|---|---|
| Coefficient of variation (CV) | 0.865758433 |
| Kurtosis | 4.84757007 |
| Mean | 16.17859668 |
| Median Absolute Deviation (MAD) | 5.952818947 |
| Skewness | 2.002355621 |
| Sum | 3525073.538 |
| Variance | 196.1892279 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4712 | 2.2% |
| 20 | 179 | 0.1% |
| 25 | 158 | 0.1% |
| 16.66666667 | 158 | 0.1% |
| 14.28571429 | 153 | 0.1% |
| 11.11111111 | 129 | 0.1% |
| 12.5 | 123 | 0.1% |
| 33.33333333 | 118 | 0.1% |
| 10 | 113 | 0.1% |
| 8.333333333 | 96 | < 0.1% |
| Other values (164669) | 211946 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4712 | |
| 0.02272727273 | 1 | < 0.1% |
| 0.1114827202 | 1 | < 0.1% |
| 0.1248439451 | 1 | < 0.1% |
| 0.1402524544 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.83974359 | 1 | |
| 99.65337955 | 1 | |
| 99.57386364 | 1 | |
| 99.39577039 | 1 | |
| 99.27007299 | 1 | |
Zipcode_2
Real number (ℝ≥0)
| Distinct | 99 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48.33178971 |
|---|---|
| Minimum | 0 |
| Maximum | 99 |
| Zeros | 51 |
| Zeros (%) | < 0.1% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 26 |
| median | 48 |
| Q3 | 71 |
| 95-th percentile | 95 |
| Maximum | 99 |
| Range | 99 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 27.47999887 |
|---|---|
| Coefficient of variation (CV) | 0.5685698593 |
| Kurtosis | -1.05133683 |
| Mean | 48.33178971 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 0.09513969524 |
| Sum | 10530772 |
| Variance | 755.1503381 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 49 | 3588 | 1.6% |
| 48 | 3541 | 1.6% |
| 12 | 3525 | 1.6% |
| 95 | 3459 | 1.6% |
| 56 | 3421 | 1.6% |
| 54 | 3393 | 1.6% |
| 28 | 3278 | 1.5% |
| 61 | 3229 | 1.5% |
| 98 | 3223 | 1.5% |
| 45 | 3137 | 1.4% |
| Other values (89) | 184091 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 51 | < 0.1% |
| 1 | 2227 | |
| 2 | 2246 | |
| 3 | 1812 | |
| 4 | 2616 | |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 1423 | |
| 98 | 3223 | |
| 97 | 2879 | |
| 96 | 1386 | |
| 95 | 3459 | |
Zipcode_3
Real number (ℝ≥0)
| Distinct | 887 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 487.4041306 |
|---|---|
| Minimum | 6 |
| Maximum | 999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 60 |
| Q1 | 262 |
| median | 483 |
| Q3 | 717 |
| 95-th percentile | 954 |
| Maximum | 999 |
| Range | 993 |
| Interquartile range (IQR) | 455 |
Descriptive statistics
| Standard deviation | 274.6330887 |
|---|---|
| Coefficient of variation (CV) | 0.563460733 |
| Kurtosis | -1.051258348 |
| Mean | 487.4041306 |
| Median Absolute Deviation (MAD) | 228 |
| Skewness | 0.09548093925 |
| Sum | 106198049 |
| Variance | 75423.33342 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 770 | 768 | 0.4% |
| 553 | 688 | 0.3% |
| 170 | 627 | 0.3% |
| 70 | 626 | 0.3% |
| 535 | 618 | 0.3% |
| 945 | 612 | 0.3% |
| 604 | 612 | 0.3% |
| 730 | 610 | 0.3% |
| 458 | 608 | 0.3% |
| 956 | 608 | 0.3% |
| Other values (877) | 211508 | |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 32 | < 0.1% |
| 7 | 17 | < 0.1% |
| 9 | 2 | < 0.1% |
| 10 | 440 | |
| 11 | 96 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 999 | 9 | < 0.1% |
| 998 | 38 | < 0.1% |
| 997 | 42 | < 0.1% |
| 996 | 145 | |
| 995 | 147 | |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
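As a minimal sketch of that definition, the function below divides the covariance by the product of the standard deviations and then checks the invariance under location and scale changes. The function name and the toy data are made up for illustration; this is not the code the profiling tool runs.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data, not taken from the profiled dataset.
rent = [800.0, 950.0, 1100.0, 1300.0, 2000.0]
price = [90000.0, 120000.0, 150000.0, 210000.0, 480000.0]

r = pearson_r(rent, price)

# Shifting and rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * x + 100.0 for x in rent],
                       [y / 1000.0 for y in price])

print(r, r_rescaled)
```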
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
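The rank-based definition can be sketched as follows: replace each value by its rank (ties share the average rank) and compute Pearson's r on the ranks. The helper names and the cubic toy data are illustrative assumptions, chosen to show a relationship that is monotonic but not linear.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y = x**3 is strictly increasing but not linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this toy data ρ is 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.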
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
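The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that formula; it is an illustration with a made-up function name, and statistical libraries often use a tie-corrected variant (tau-b) that can give different values when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Tied pairs (direction == 0) count toward neither total.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six possible pairs.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.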
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | df_index | RentPrice | SizeRank | HomePrice | Vacancy_Rate% | Zipcode_2 | Zipcode_3 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1311.076 | 11179.0 | 274920.17 | 3.116343 | 02 | 023 |
| 1 | 2 | 1484.626 | 8621.0 | 415097.50 | 4.464646 | 02 | 023 |
| 2 | 4 | 1524.006 | 9640.0 | 247510.42 | 3.732901 | 02 | 023 |
| 3 | 5 | 1310.016 | 5289.0 | 264492.50 | 7.960256 | 02 | 023 |
| 4 | 6 | 1307.736 | 9579.0 | 309743.67 | 11.565968 | 02 | 023 |
| 5 | 7 | 1399.926 | 7293.0 | 279614.92 | 5.455122 | 02 | 023 |
| 6 | 8 | 1753.956 | 9084.0 | 371979.42 | 2.849920 | 02 | 023 |
| 7 | 10 | 1412.936 | 7427.0 | 316128.33 | 4.690117 | 02 | 023 |
| 8 | 11 | 1551.496 | 264.0 | 302772.08 | 15.666143 | 02 | 023 |
| 9 | 12 | 1850.286 | 8710.0 | 335381.75 | 9.368875 | 02 | 023 |
Last rows
| | df_index | RentPrice | SizeRank | HomePrice | Vacancy_Rate% | Zipcode_2 | Zipcode_3 |
|---|---|---|---|---|---|---|---|
| 217875 | 264947 | 1846.53 | 6680.0 | 720786.92 | 5.024494 | 98 | 981 |
| 217876 | 264948 | 1840.94 | 2689.0 | 766981.50 | 4.885872 | 98 | 981 |
| 217877 | 264949 | 1591.40 | 1122.0 | 592306.58 | 6.737354 | 98 | 981 |
| 217878 | 264950 | 1909.58 | 28159.0 | 438970.00 | 13.580247 | 98 | 981 |
| 217879 | 264953 | 1413.89 | 7640.0 | 317426.75 | 4.853765 | 98 | 982 |
| 217880 | 264955 | 1059.87 | 23400.0 | 552805.42 | 51.219512 | 98 | 982 |
| 217881 | 264956 | 993.85 | 25265.0 | 678499.00 | 51.329243 | 98 | 982 |
| 217882 | 264957 | 1533.50 | 4981.0 | 314320.83 | 6.540162 | 98 | 983 |
| 217883 | 264958 | 778.99 | 26185.0 | 150193.17 | 28.537736 | 98 | 983 |
| 217884 | 264959 | 1840.86 | 6759.0 | 535136.75 | 7.340077 | 98 | 983 |